Tolerating Node Failures in Cache Only Memory Architectures Michel Banâtre, Alain Gefflaut, Christine Morin

Authors

Michel Banâtre

Christine Morin

Abstract

COMAs (Cache Only Memory Architectures) are an interesting class of large scale shared memory multiprocessors. They extend the concepts of cache memories and shared virtual memory by using the local memories of the nodes as large caches for a single shared address space. Due to their large number of components, these architectures are particularly susceptible to hardware failures, so fault tolerance mechanisms have to be introduced to ensure high availability. In this paper, we propose an implementation of backward error recovery in a COMA which minimizes performance degradation and requires few hardware modifications. This implementation uses the features of a COMA to implement a stable storage abstraction using the standard memories of the architecture. Recovery data are replicated and mixed with current data in node memories, both of which are managed in a transparent way using an extended coherence protocol.

Affiliations: (URA 227) Université de Rennes 1 – Insa de Rennes; … et en Automatique – unité de recherche de Rennes

French title: Proposition d'une architecture extensible COMA tolérant les défaillances de nœuds (Proposal for a scalable COMA architecture tolerating node failures)
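The backward error recovery idea summarized in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the protocol proposed in the paper: the names `Node`, `ToyComa`, `checkpoint`, and `fail_and_rollback` are hypothetical, writes are simplified to a single current copy per item, and each recovery copy is assumed to be kept on exactly two distinct nodes so that one node failure cannot destroy the recovery point.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Local memory of one node, used as a large cache (hypothetical model)."""
    current: dict = field(default_factory=dict)   # item -> value being modified
    recovery: dict = field(default_factory=dict)  # item -> value at last checkpoint


class ToyComa:
    """Toy machine where recovery data live next to current data in node memories."""

    def __init__(self, n_nodes):
        if n_nodes < 2:
            raise ValueError("need at least two nodes to replicate recovery data")
        self.nodes = [Node() for _ in range(n_nodes)]

    def write(self, node_id, item, value):
        # Simplification: a write invalidates every other current copy.
        for node in self.nodes:
            node.current.pop(item, None)
        self.nodes[node_id].current[item] = value

    def read(self, item):
        for node in self.nodes:
            if item in node.current:
                return node.current[item]
        raise KeyError(item)

    def checkpoint(self):
        # New recovery point: each current value gets two recovery copies
        # on distinct nodes (assumed replication degree).
        owners = {item: i for i, node in enumerate(self.nodes)
                  for item in node.current}
        values = {item: self.nodes[i].current[item] for item, i in owners.items()}
        for node in self.nodes:
            node.recovery.clear()
        for item, owner in owners.items():
            for i in (owner, (owner + 1) % len(self.nodes)):
                self.nodes[i].recovery[item] = values[item]

    def fail_and_rollback(self, failed_id):
        # The failed node loses its whole memory; model it as an empty node.
        self.nodes[failed_id] = Node()
        # Gather the surviving recovery copies.
        restored = {}
        for node in self.nodes:
            restored.update(node.recovery)
        # Discard all current data and restart from the recovery point.
        for node in self.nodes:
            node.current.clear()
        survivor = (failed_id + 1) % len(self.nodes)
        self.nodes[survivor].current.update(restored)
        self.checkpoint()  # restore two recovery copies per item
```

A short usage example: after a checkpoint, a later write is undone when a node fails, and the values saved at the checkpoint become current again.

```python
m = ToyComa(3)
m.write(0, "x", 1)
m.write(1, "y", 2)
m.checkpoint()
m.write(2, "x", 99)      # modification made after the recovery point
m.fail_and_rollback(0)   # node 0 fails; execution restarts from the checkpoint
print(m.read("x"), m.read("y"))
```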


Similar resources

An Architecture for Tolerating Processor Failures in Shared Memory Multiprocessors

In this paper, we focus on the problem of recovering processor failures in shared memory multiprocessors. We propose an architecture designed for transparently tolerating processor failures. The Recoverable Shared Memory (RSM) is the main component of this architecture which provides a hardware supported backward error recovery mechanism. This technique copes with standard caches and cache cohe...


An Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures

Distributed Shared Memory (DSM) architectures are attractive for executing high performance parallel applications. Made up of a large number of components, these architectures have, however, a high probability of failure. We propose a protocol to tolerate node failures in two classes of DSM architectures: Cache Only Memory Architectures (COMA) and Distributed Virtual Shared Memory (SVM) systems. Th...


A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability

Large-scale distributed systems are very attractive for the execution of parallel applications requiring a huge computing power. However, their high probability of site failure is unacceptable, especially for long time running applications. In this paper, we address this problem and propose a checkpointing mechanism relying on a recoverable distributed shared memory (DSM). Although most recover...


7 Related Work

In modern processors, the dynamic translation of virtual addresses to support virtual memory is done before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB (Translation Lookaside Buffer) are getting more and more difficult to meet. The situatio...


Cache-Only Memory Architectures

Scalable shared-memory multiprocessors are emerging as attractive platforms for applications with high-performance demands. What makes these machines attractive is the shared address space, which allows processors in a multiprocessor to share data the same way it is shared by multiple processes in a sequential machine. The shared-memory paradigm makes it easier to write parallel pr...




Publication date: 1994
